Improving speech synthesis quality by reducing pitch peaks in the source recordings

Authors

Luisina Violante

Pablo Rodríguez Zivic

Agustín Gravano

Abstract

We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists of removing pronounced pitch peaks from the original recordings, since such peaks typically lead to noticeable discontinuities in the synthesized speech. We evaluated the method perceptually with two concatenative and two HMM-based synthesis systems, and found that applying it to the source recordings improved the naturalness of the synthesizers and had no effect on their intelligibility.
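The abstract does not say how peaks are detected or reduced, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes a per-frame F0 contour in Hz with 0.0 marking unvoiced frames, and it compresses any value that rises above a ceiling set relative to the median voiced F0. The function name `reduce_pitch_peaks` and the parameters `max_ratio` and `compression` are hypothetical.

```python
from statistics import median


def reduce_pitch_peaks(f0_hz, max_ratio=1.5, compression=0.3):
    """Compress F0 values above max_ratio * median voiced F0.

    f0_hz: per-frame F0 in Hz, with 0.0 marking unvoiced frames.
    max_ratio, compression: illustrative assumptions, not values from the paper.
    Returns a new list; unvoiced and in-range frames are unchanged.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)

    ceiling = max_ratio * median(voiced)
    reduced = []
    for f in f0_hz:
        if f > ceiling:
            # Keep only a fraction of the excursion above the ceiling.
            reduced.append(ceiling + compression * (f - ceiling))
        else:
            reduced.append(f)
    return reduced


contour = [0.0, 110.0, 115.0, 120.0, 210.0, 230.0, 125.0, 118.0, 0.0]
flattened = reduce_pitch_peaks(contour)
```

This only produces a target contour. Imposing it on the recordings themselves would require a separate pitch-modification step, which is not shown here.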


Similar resources

Maximum-likelihood dynamic intonation model for concatenative text-to-speech system

In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...


Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion

This paper discusses two issues in improving the quality of F0-modified speech based on PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of the PSOLA window influences the quality of the synthesized speech, and one of them claimed that the center of the window should be located at a pitch pulse in the source waveform. However, pitch pulse detection sometimes fails due to...


On Reducing the Buzz in LPC Synthesis

A method for reducing the characteristic buzz of LPC synthetic speech is presented. The method consists of using a non-impulse source to excite the LPC synthesizer during voiced sounds. One novel feature is that the temporal parameters of the source are kept in fixed proportion to the pitch period. An extensive perceptual experiment has shown that the resulting quality of the synthe...


An Algorithm for Locating Fundamental Frequency (f0) Markers in Speech

Princy Dikshit, Old Dominion University, December 2004. Director: Dr. Stephen A. Zahorian. Speech has been the principal form of human communication since it began to evolve at least one hundred thousand years ago. Speech is produced by vibrations of the vocal cords. The rate of vibration of the cords is called fundamental freq...


Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer

An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute for the excitation provided by glottal vibrations. Although the user can set the vibration level and pitch, dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limi...



Publication date: 2013
